A Deep Bag-of-Features Model for Music Auto-Tagging

Authors

  • Juhan Nam
  • Jorge Herrera
  • Kyogu Lee
Abstract

Feature learning and deep learning have drawn great attention in recent years as a way of transforming input data into more effective representations using learning algorithms. Such interest has grown in the area of music information retrieval (MIR) as well, particularly in music audio classification tasks such as auto-tagging. In this paper, we present a two-stage learning model to effectively predict multiple labels from music audio. The first stage learns to project local spectral patterns of an audio track onto a high-dimensional sparse space in an unsupervised manner and summarizes the audio track as a bag-of-features. The second stage successively performs unsupervised learning on the bag-of-features in a layer-by-layer manner to initialize a deep neural network and finally fine-tunes it with the tag labels. Through experiments, we rigorously examine training choices and tuning parameters, and show that the model achieves high performance on Magnatagatune, a widely used dataset in music auto-tagging.
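As a rough illustration of the first stage described in the abstract, the sketch below turns a spectrogram into a single bag-of-features vector. It is not the authors' implementation: the abstract does not name the unsupervised learner, so plain k-means with a soft-threshold encoder is swapped in as a stand-in, and all function names, patch widths, and dictionary sizes are hypothetical. The second stage (layer-wise pretraining and fine-tuning of a deep network) is not shown.

```python
import numpy as np

def extract_patches(spec, width):
    """Slice a (n_frames, n_bins) spectrogram into overlapping, flattened patches."""
    n_frames, _ = spec.shape
    return np.stack([spec[i:i + width].ravel() for i in range(n_frames - width + 1)])

def learn_dictionary(patches, n_atoms, n_iter=10, seed=0):
    """Unsupervised dictionary via plain k-means (a stand-in, assumed for illustration)."""
    rng = np.random.default_rng(seed)
    centers = patches[rng.choice(len(patches), n_atoms, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every patch to every center.
        dists = ((patches[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for k in range(n_atoms):
            members = patches[assign == k]
            if len(members):  # leave empty clusters where they are
                centers[k] = members.mean(axis=0)
    return centers

def encode_sparse(patches, centers, threshold=0.0):
    """Project patches onto the dictionary and rectify, giving sparse non-negative codes."""
    acts = patches @ centers.T
    acts = acts - acts.mean(axis=1, keepdims=True)
    return np.maximum(acts - threshold, 0.0)

def bag_of_features(spec, centers, width):
    """Summarize a whole track by max-pooling its local codes over time."""
    codes = encode_sparse(extract_patches(spec, width), centers)
    return codes.max(axis=0)
```

A track-level vector produced this way would then serve as the input to a tag classifier. Pooling over time discards the order of the local patterns, which is what makes the summary a "bag".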

Similar resources

Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms

Recently, the end-to-end approach that learns hierarchical representations from raw data using deep convolutional neural networks has been successfully explored in the image, text and speech domains. This approach has been applied to musical signals as well but has not been fully explored yet. To this end, we propose sample-level deep convolutional neural networks which learn representations from ve...

Mirex 2010 Audio Tag Classification via a Bag of Systems Representation

This paper describes an auto-tagging system presented to MIREX 2011 that represents a “Bag of Systems” (BoS) representation of music. Similar to the Bag of Words representation for text documents, the BoS representation uses a dictionary of musical codewords, where each codeword is a generative model that captures timbral and temporal characteristics of music. Songs are represented as a BoS his...

Semantic Annotation and Retrieval of Music using a Bag of Systems Representation

We present a content-based auto-tagger that leverages a rich dictionary of musical codewords, where each codeword is a generative model that captures timbral and temporal characteristics of music. This leads to a higher-level, concise “Bag of Systems” (BoS) representation of the characteristics of a musical piece. Once songs are represented as a BoS histogram over codewords, traditional algorit...

Codebook-based Scalable Music Tagging with Poisson Matrix Factorization

Automatic music tagging is an important but challenging problem within MIR. In this paper, we treat music tagging as a matrix completion problem. We apply the Poisson matrix factorization model jointly on the vector-quantized audio features and a “bag-of-tags” representation. This approach exploits the shared latent structure between semantic tags and acoustic codewords. Leveraging the recently...

Improving Auto-tagging by Modeling Semantic Co-occurrences

Automatic taggers describe music in terms of a multinomial distribution over relevant semantic concepts. This paper presents a framework for improving automatic tagging of music content by modeling contextual relationships between these semantic concepts. The framework extends existing auto-tagging methods by adding a Dirichlet mixture to model the contextual co-occurrences between semantic mul...

Journal title:
  • CoRR

Volume abs/1508.04999

Publication date: 2015